Practice with ggplot2

Where should I put the aes() bit?

If you put it at the “top level”, inside ggplot(aes(...)), the mapping applies to every layer. For example:

bears %>% 
  count(month) %>%
  ggplot(aes(x = month, y = n)) +
  geom_point() + 
  geom_line()

In contrast, if you put the aes() mapping inside a single geom layer, it applies only to that layer. For example, this causes an error because geom_line() has no aesthetic mapping of its own and there is none at the top level to inherit:

bears %>% 
  count(month) %>% 
  ggplot() +
  geom_point(aes(x = month, y = n)) + 
  geom_line()
#> Error in `geom_line()`:
#> ! Problem while setting up geom.
#> ℹ Error occurred in the 2nd layer.
#> Caused by error in `compute_geom_1()`:
#> ! `geom_line()` requires the following missing aesthetics: x and y
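
There are two ways to fix that error, sketched below. Since the bears data isn't included here, I'm using a small made-up data frame (monthly) as a stand-in for the result of bears %>% count(month); the numbers are invented for illustration only.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for bears %>% count(month); values are made up
monthly <- data.frame(month = 1:6, n = c(2, 5, 9, 14, 8, 3))

# Option 1: repeat the mapping inside every layer that needs it
p_each <- monthly %>%
  ggplot() +
  geom_point(aes(x = month, y = n)) +
  geom_line(aes(x = month, y = n))

# Option 2: share x and y at the top level, then add
# layer-specific mappings only where you need them
p_shared <- monthly %>%
  ggplot(aes(x = month, y = n)) +
  geom_point(aes(size = n)) + # size applies to the points only
  geom_line()

p_shared
```

Option 2 is usually less typing, and you can still add extra mappings to individual layers.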

Main geoms

geom_point()

Basic scatterplot:

mpg %>% 
  ggplot() +
  geom_point(aes(x = displ, y = hwy))

Change the color of all points by setting color outside of aes():

mpg %>% 
  ggplot() +
  geom_point(aes(x = displ, y = hwy), color = 'blue')

To change color based on a variable, map the variable to color in aes():

mpg %>% 
  ggplot() +
  geom_point(aes(x = displ, y = hwy, color = class)) 

Map the shape instead of color (usually not a great idea):

mpg %>% 
  ggplot() +
  geom_point(aes(x = displ, y = hwy, shape = class)) 

What happened to SUV?
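
The suv points disappear because ggplot2's default shape palette only has six shapes, so the seventh class gets none (ggplot2 prints a warning about this). If you really do want one shape per class, one option is to supply the shapes yourself with scale_shape_manual(). A minimal sketch, assuming any seven distinct shapes will do:

```r
library(dplyr)
library(ggplot2)

# 0:6 picks seven of R's built-in point shapes; the choice is arbitrary
p <- mpg %>%
  ggplot() +
  geom_point(aes(x = displ, y = hwy, shape = class)) +
  scale_shape_manual(values = 0:6)

p
```

Even with all seven shapes showing, this is hard to read, which is part of why mapping shape is usually not a great idea.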

geom_line() vs. geom_smooth()

geom_line() connects all the dots:

mpg %>% 
  ggplot() +
  geom_line(aes(x = displ, y = hwy))

This looks messy because geom_line() literally connects every point in order of x, from left to right, and many cars share the same displ value.
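
geom_line() works best when each x value has exactly one y value. Here is a sketch of one way to get there, assuming the mean highway mileage per engine size is a reasonable summary (hwy_by_displ and mean_hwy are names I made up for this example):

```r
library(dplyr)
library(ggplot2)

# Collapse to one row per engine size so each x has a single y
hwy_by_displ <- mpg %>%
  group_by(displ) %>%
  summarise(mean_hwy = mean(hwy))

p <- hwy_by_displ %>%
  ggplot() +
  geom_line(aes(x = displ, y = mean_hwy))

p
```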

If you want a single smoothed “best-fit” trend line instead, use geom_smooth():

mpg %>% 
  ggplot() +
  geom_smooth(aes(x = displ, y = hwy))

Set se = FALSE to drop the shaded confidence band around the line:

mpg %>% 
  ggplot() +
  geom_smooth(aes(x = displ, y = hwy), se = FALSE)
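
By default, geom_smooth() picks the smoothing method for you (loess for a small data set like mpg), which is why the line is curved. If you want a straight line, you can ask for a linear model with method = "lm". A sketch that also puts the mapping at the top level so the points and the trend line share it:

```r
library(dplyr)
library(ggplot2)

p <- mpg %>%
  ggplot(aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) # straight-line fit, no band

p
```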

geom_col()

For these examples, I’m first creating a smaller summary data frame that counts the rows in each class:

mpg %>% 
  count(class)
#> # A tibble: 7 × 2
#>   class          n
#>   <chr>      <int>
#> 1 2seater        5
#> 2 compact       47
#> 3 midsize       41
#> 4 minivan       11
#> 5 pickup        33
#> 6 subcompact    35
#> 7 suv           62

Basic bar plot of the counts:

mpg %>% 
  count(class) %>% 
  ggplot() +
  geom_col(aes(x = class, y = n), width = 0.7) # width is width of bars

Re-order bars based on count using reorder():

mpg %>% 
  count(class) %>% 
  ggplot() +
  geom_col(aes(x = reorder(class, n), y = n), width = 0.7)
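
reorder() sorts from smallest to largest. To put the tallest bar first, one simple trick is to reorder by the negative of the count. A quick sketch (class_counts is just a name I picked for the summary data frame):

```r
library(dplyr)
library(ggplot2)

class_counts <- mpg %>%
  count(class)

p <- class_counts %>%
  ggplot() +
  # -n flips the sort so the largest count comes first
  geom_col(aes(x = reorder(class, -n), y = n), width = 0.7)

p
```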

To change the color for all bars, use fill (not color, which only changes the bar outline):

mpg %>% 
  count(class) %>% 
  ggplot() +
  geom_col(aes(x = reorder(class, n), y = n), fill = 'blue', width = 0.7)

To change color based on a variable, map the variable to fill in aes():

mpg %>% 
  count(class, drv) %>% # Note I had to include drv in the count too 
  ggplot() +
  # FUN = sum orders classes by total count; reorder() defaults to the mean
  geom_col(aes(x = reorder(class, n, FUN = sum), y = n, fill = drv), width = 0.7)

Use position = 'dodge' to change from stacked to side-by-side:

mpg %>% 
  count(class, drv) %>% # Note I had to include drv in the count too 
  ggplot() +
  geom_col(
    # FUN = sum orders classes by total count; reorder() defaults to the mean
    aes(x = reorder(class, n, FUN = sum), y = n, fill = drv),
    position = "dodge", width = 0.7)
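
With position = "dodge", a class that has fewer drv values gets wider bars, because the bars in each group stretch to fill the full width. If that bothers you, one option is position_dodge(preserve = "single"), which keeps every bar the same width. A minimal sketch (I've dropped the reordering to keep it short):

```r
library(dplyr)
library(ggplot2)

p <- mpg %>%
  count(class, drv) %>%
  ggplot() +
  geom_col(
    aes(x = class, y = n, fill = drv),
    # preserve = "single" gives every bar the width of a single dodged bar
    position = position_dodge(preserve = "single"), width = 0.7)

p
```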

Practice

Facets

Facets split a plot into multiple small charts, one per level of a categorical variable, and are useful when that variable has many levels.

For example, this plot has too many color categories for the color to be useful:

mpg %>%
  ggplot(aes(x = displ, y = hwy)) +
  geom_point(aes(color = class))

Instead, we can use facet_wrap() to draw a separate chart for each vehicle class:

mpg %>%
  ggplot(aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~class)
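
facet_wrap() chooses the grid layout for you, but you can control it with nrow or ncol. A sketch, assuming you want the seven classes laid out in two rows:

```r
library(dplyr)
library(ggplot2)

p <- mpg %>%
  ggplot(aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~class, nrow = 2) # seven panels arranged in two rows

p
```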

You can also use facet_grid() to facet by two variables, written as rows ~ columns:

mpg %>%
  ggplot(aes(x = displ, y = hwy)) +
  geom_point() +
  facet_grid(drv ~ cyl)

Extra Practice

bears %>%
  count(year, gender)
#> # A tibble: 102 × 3
#>     year gender     n
#>    <dbl> <chr>  <int>
#>  1  1901 female     1
#>  2  1901 male       2
#>  3  1906 male       1
#>  4  1908 <NA>       1
#>  5  1916 male       1
#>  6  1922 male       1
#>  7  1929 female     1
#>  8  1929 male       2
#>  9  1930 male       1
#> 10  1932 male       3
#> # ℹ 92 more rows

mpg %>%
  mutate(manufacturer = str_to_title(manufacturer)) %>%
  group_by(manufacturer) %>%
  summarise(mean_hwy = mean(hwy))
#> # A tibble: 15 × 2
#>    manufacturer mean_hwy
#>    <chr>           <dbl>
#>  1 Audi             26.4
#>  2 Chevrolet        21.9
#>  3 Dodge            17.9
#>  4 Ford             19.4
#>  5 Honda            32.6
#>  6 Hyundai          26.9
#>  7 Jeep             17.6
#>  8 Land Rover       16.5
#>  9 Lincoln          17  
#> 10 Mercury          18  
#> 11 Nissan           24.6
#> 12 Pontiac          26.4
#> 13 Subaru           25.6
#> 14 Toyota           24.9
#> 15 Volkswagen       29.2
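
Here is one possible way to chart that manufacturer summary, not necessarily the intended solution. It assumes horizontal bars sorted by mean highway mileage are what you're after (mean_hwy_by_make is a name I made up), and it puts the manufacturer on the y axis so the long names stay readable:

```r
library(dplyr)
library(stringr)
library(ggplot2)

mean_hwy_by_make <- mpg %>%
  mutate(manufacturer = str_to_title(manufacturer)) %>%
  group_by(manufacturer) %>%
  summarise(mean_hwy = mean(hwy))

p <- mean_hwy_by_make %>%
  ggplot() +
  # Numeric variable on x and category on y gives horizontal bars
  geom_col(
    aes(x = mean_hwy, y = reorder(manufacturer, mean_hwy)),
    width = 0.7)

p
```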